knitr::opts_chunk$set(echo = TRUE)
library(ggplot2)
library(ggthemes)
library(cowplot)
library(kableExtra)

Abstract

Call me Ishmael. Some years ago -- never mind how long precisely -- having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world. It is a way I have of driving off the spleen and regulating the circulation. Whenever I find myself growing grim about the mouth; whenever it is a damp, drizzly November in my soul; whenever I find myself involuntarily pausing before coffin warehouses, and bringing up the rear of every funeral I meet; and especially whenever my hypos get such an upper hand of me, that it requires a strong moral principle to prevent me from deliberately stepping into the street, and methodically knocking people's hats off -- then, I account it high time to get to sea as soon as I can. This is my substitute for pistol and ball. With a philosophical flourish Cato throws himself upon his sword; I quietly take to the ship. There is nothing surprising in this. If they but knew it, almost all men in their degree, some time or other, cherish very nearly the same feelings towards the ocean with me.

Introduction to the Package

This is our Package

One morning, when Gregor Samsa woke from troubled dreams, he found himself transformed in his bed into a horrible vermin. He lay on his armour-like back, and if he lifted his head a little he could see his brown belly, slightly domed and divided by arches into stiff sections. The bedding was hardly able to cover it and seemed ready to slide off any moment. His many legs, pitifully thin compared with the size of the rest of him, waved about helplessly as he looked.

Motivating the Use of Bayesian Statistics

She emptied her mind of all thought of herself, of her children, of all anger, of all rebellion, of all questions. Then with a profound and deeply willed desire to believe, to be heard, as she had done every day since the murder of Carlo Rizzi, she said the necessary prayers for the soul of Michael Corleone.

Motivating the Use of Nonparametric Statistics

In the criminal justice system, the people are represented by two separate yet equally important groups. The police who investigate crime and the district attorneys who prosecute the offenders. These are their stories.

The kind of problems we will tackle in this paper

In 1972, a crack commando unit was sent to prison by a military court for a crime they didn't commit. These men promptly escaped from a maximum security stockade to the Los Angeles underground. Today, still wanted by the government they survive as soldiers of fortune. If you have a problem, if no one else can help, and if you can find them....maybe you can hire The A-Team.

Simple Betas

There was only one catch and that was Catch-22, which specified that a concern for one's own safety in the face of dangers that were real and immediate was the process of a rational mind. Orr was crazy and could be grounded. All he had to do was ask; and as soon as he did, he would no longer be crazy and would have to fly more missions. Orr would be crazy to fly more missions and sane if he didn't, but if he was sane, he had to fly them. If he flew them, he was crazy and didn't have to; but if he didn't want to, he was sane and had to. Yossarian was moved very deeply by the absolute simplicity of this clause of Catch-22 and let out a respectful whistle. "That's some catch, that Catch-22," he observed. "It's the best there is," Doc Daneeka agreed.

Correlational Problems

Kendall $\tau$

Sample data from book

x<-c(2.5, 2.5, 2.5,  2.5, 7.5, 7.5, 7.5, 7.5,  7.5,  7.5, 12.5, 12.5, 12.5, 12.5)
y<-c(3.5, 9.0, 9.0, 13.0, 3.5, 3.5, 9.0, 9.0, 13.0, 13.0,  3.5,  3.5,  3.5,  9.0)

$\tau_b$

cor.test(x, y, method="kendall")
tau.b<-unname(cor.test(x, y, method="kendall")$estimate)
tau.b

Note that both x and y contain tied values, so the estimate returned by cor.test() above is the tie-adjusted $\tau_b$. The corrected $\tau$ below instead adjusts the maximum number of comparable pairs for ties in x, ties in y, and ties in the joint (x, y) pairs.

Corrected $\tau$

Tau<-function(x, y){
  xy<-data.frame(x,y)                               #append x and y vectors
  x_ties<-table(x)[table(x)>1]                      #List of x ties
  y_ties<-table(y)[table(y)>1]                      #List of y of ties
  xy_ties<-table(xy)[table(xy)>1]                   #List of xy ties
  t_xi<-unname(table(x)[table(x)>1])                #Count T_x sizes of ties
  t_yi<-unname(table(y)[table(y)>1])                #Count T_y sizes of ties
  t_xyi<-unname(table(xy)[table(xy)>1])             #Count T_xy sizes of ties
  Tx<-sum((t_xi*(t_xi-1))/2)                        #Calculate Tx
  Ty<-sum((t_yi*(t_yi-1))/2)                        #Calculate Ty
  Txy<-sum(t_xyi*(t_xyi-1)/2)                       #Calculate Txy
  n<-length(x)
  n_max<-n*(n-1)/2-Tx-Ty+Txy
  xy_ranks<-data.frame(xrank=rank(xy$x, ties.method="average"),
                       yrank=rank(xy$y, ties.method="average"))
  xy_c<-xy_ranks[order(x, -y),]       # for n_c, sort on ascending x then descending y
  xy$concordant<-rep(NA, nrow(xy))
  for (i in 1:(nrow(xy)-1)){
    xy$concordant[i]<-sum(xy_c$yrank[(i+1):length(xy_c$yrank)]>xy_c$yrank[i])
  }
  nc<-sum(xy$concordant, na.rm=TRUE)
  nd<-n_max-nc
  Tau<-(nc-nd)/n_max
  list(Tau=Tau,
       "Tied x Values"=x_ties,
       "Tied y Values"=y_ties,
       "Tied xy Values"=xy_ties,
       nc=nc,
       nd=nd,
       nmax=n_max)

}
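
As a quick check, the corrected-$\tau$ function can be applied to the sample data from the book defined above:

Tau(x, y)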

Goodness-of-Fit Using $\phi_c$

Phi<-function(x, y, a.prior=1, b.prior=1, hdi.width=0.95){
  xy<-data.frame(x,y)                               #append x and y vectors
  t_xi<-unname(table(x)[table(x)>1])                #Counting T_x sizes of ties
  t_yi<-unname(table(y)[table(y)>1])                #Counting T_y sizes of ties
  Tx<-sum((t_xi*(t_xi-1))/2)                        #Calculating Tx
  Ty<-sum((t_yi*(t_yi-1))/2)                        #Calculating Ty
  t_xyi<-unname(table(xy)[table(xy)>1])             #Calculating txyi
  Txy<-sum(t_xyi*(t_xyi-1)/2)                       #Calculating Txy
  n<-length(x)
  n_max<-n*(n-1)/2-Tx-Ty+Txy
  xy_ranks<-data.frame(xrank=rank(xy$x, ties.method="average"),
                       yrank=rank(xy$y, ties.method="average"))
  xy_c<-xy_ranks[order(x, -y),]       # for n_c, sort on ascending x then descending y
  xy$concordant<-rep(NA, nrow(xy))
  for (i in 1:(nrow(xy)-1)){
    xy$concordant[i]<-sum(xy_c$yrank[(i+1):length(xy_c$yrank)]>xy_c$yrank[i])
  }
  nc<-sum(xy$concordant, na.rm=TRUE)
  nd<-n_max-nc
  Tau<-(nc-nd)/n_max
  p_c<-(Tau+1)/2
  a.post<-a.prior+nc
  b.post<-b.prior+nd
  post.median<-qbeta(0.5, a.post, b.post)                        #posterior median
  post.hdi.lower<-qbeta((1-hdi.width)/2, a.post, b.post)         #equal-tail posterior interval limits
  post.hdi.upper<-qbeta(1-(1-hdi.width)/2, a.post, b.post)

  list(tau=Tau,
       sample.p=p_c,
       alpha=a.post,
       beta=b.post,
       post.median=post.median,
       post.hdi.lower=post.hdi.lower,
       post.hdi.upper=post.hdi.upper)
}
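
Applied to the same sample data, with the default uniform beta(1, 1) prior and a 95% interval:

Phi(x, y)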

$\phi^*$

Phi_star<-function(x, y, a.prior=1, b.prior=1, hdi.width=0.95, fitting.parameters){
  m<-fitting.parameters
  xy<-data.frame(x,y)                               #append x and y vectors
  t_xi<-unname(table(x)[table(x)>1])                #Counting T_x sizes of ties
  t_yi<-unname(table(y)[table(y)>1])                #Counting T_y sizes of ties
  Tx<-sum((t_xi*(t_xi-1))/2)                        #Calculating Tx
  Ty<-sum((t_yi*(t_yi-1))/2)                        #Calculating Ty
  t_xyi<-unname(table(xy)[table(xy)>1])             #Calculating txyi
  Txy<-sum(t_xyi*(t_xyi-1)/2)                       #Calculating Txy
  n<-length(x)
  n_max<-n*(n-1)/2-Tx-Ty+Txy
  xy_ranks<-data.frame(xrank=rank(xy$x, ties.method="average"),
                       yrank=rank(xy$y, ties.method="average"))
  xy_c<-xy_ranks[order(x, -y),]       # for n_c, sort on ascending x then descending y
  xy$concordant<-rep(NA, nrow(xy))
  for (i in 1:(nrow(xy)-1)){
    xy$concordant[i]<-sum(xy_c$yrank[(i+1):length(xy_c$yrank)]>xy_c$yrank[i])
  }
  nc<-sum(xy$concordant, na.rm=TRUE)
  nd<-n_max-nc
  Tau<-(nc-nd)/n_max
  Tau_star<-(nc_star-nd)/(((n*(n-1))/2)-Tx-Ty+Txy)  #adjusted Tau (note: nc_star is not yet defined in this draft; presumably it is derived from the fitting parameters m)
  p_c_star<-(Tau_star+1)/2                          #adjusted p_c (maybe unnecessary?)
  a.post_star<-a.prior+nc_star                      #adjusted a'
  b.post_star<-b.prior+nd                           #b' is unchanged
  post.median_star<-qbeta(0.5, a.post_star, b.post_star)
  post.hdi_star.lower<-qbeta((1-hdi.width)/2, a.post_star, b.post_star)
  post.hdi_star.upper<-qbeta(1-(1-hdi.width)/2, a.post_star, b.post_star)

  list(tau=Tau,
       Tau_star=Tau_star,
       sample.p=p_c_star,
       post.median=post.median_star,
       post.hdi_star.lower=post.hdi_star.lower,
       post.hdi_star.upper=post.hdi_star.upper)
}

Goodman-Kruskal $\gamma$

Function to format two (raw) vectors as a gamma table

Vec_to_table<-function(x, y, quantiles_x, quantiles_y){
  x_cut<-cut(x, quantiles_x)
  y_cut<-cut(y, quantiles_y)
  return(table(x_cut, y_cut))
}

Function to format a gamma table to two (grouped) vectors (for use in the $\phi_c$ function)

Table_to_vec<-function(table){
  x<-rep(1:nrow(table), unname(rowSums(table)))             #row index repeated by row total
  y<-rep(as.vector(t(col(table))), as.vector(t(table)))     #column index repeated by cell count
  list(x=x,
       y=y)
}
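
A quick round-trip check using the Kendall sample data from earlier; three bins per margin is an arbitrary choice for illustration, and example_table is just an illustrative name:

example_table<-Vec_to_table(x, y, quantiles_x=3, quantiles_y=3)
example_table
Table_to_vec(example_table)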

Examples for documentation

x<-c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5)
y<-c(1, 1, 4, 4, 5, 1, 2, 3, 5, 1, 2, 3, 3, 3, 3, 5, 2, 4, 4, 4, 5, 5, 3, 4, 4, 5, 5)

Gamma_Concordance(x, y, quantiles_x=5, quantiles_y=5)

t-test Analogues

Wilcoxon

Small Sample/Large Sample

Discrete Wilcoxon Function

The input is $n$ and $t_+$.

#Wilcox_small<-function(n, tplus, intervals=200, samples=20000)
intervals=200
samples=20000
n=20
tplus=151
fphi<-rep(0.0, intervals)                             #counts of simulated t+ values matching tplus, one per grid point

for (j in 1:intervals){
  phi=1/(2*intervals)+(j-1)*(1/intervals)
  for (k in 1:samples){
    tz=sum((1:n)*rbinom(n, 1, phi))                   #simulated t+ given sign probability phi
    if(tz==tplus) {
      fphi[j]=fphi[j]+1.0
    }
  }
}
tot=sum(fphi)
phipost=fphi/tot                                      #discretized posterior over the phi grid
phiv=1/(2*intervals)+(0:(intervals-1))*(1/intervals)  #grid of phi values
phibar=sum(phiv*phipost)                              #posterior mean
cphi=cumsum(phipost)
prH1=1-cphi[round(intervals/2)]                       #posterior probability that phi > 0.5
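
For the package, the script above could be wrapped into the function whose signature is commented out at the top of the chunk. A minimal sketch, with the returned quantities chosen here to mirror the Mann-Whitney functions below:

Wilcox_small<-function(n, tplus, intervals=200, samples=20000){
  fphi<-rep(0.0, intervals)                           #counts of simulated t+ values matching tplus
  for (j in 1:intervals){
    phi<-1/(2*intervals)+(j-1)*(1/intervals)
    for (k in 1:samples){
      tz<-sum((1:n)*rbinom(n, 1, phi))                #simulated t+ given sign probability phi
      if (tz==tplus) fphi[j]<-fphi[j]+1
    }
  }
  phipost<-fphi/sum(fphi)                             #discretized posterior over the phi grid
  phiv<-1/(2*intervals)+(0:(intervals-1))*(1/intervals)
  list(phipost=phipost,
       phibar=sum(phiv*phipost),                      #posterior mean
       prH1=1-cumsum(phipost)[round(intervals/2)])    #posterior probability that phi > 0.5
}

Wilcox_small(n=20, tplus=151)$phibar                  #same inputs as the hard-coded case above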

Large-Sample Wilcoxon

Haven't done it yet.

Mann-Whitney $U$

Small Sample/Large Sample Considerations

## Example 7.1 (p. 342)

Diet_A<-c(900, 1347, 1793, 2565, 2697, 3102, 3152, 3240, 3284, 3913)
Diet_B<-c( 11,   25,  948, 1396, 1487, 1946, 2343, 2483, 3010, 3071)
## Example 7.2 (pp. 342-343)

C_fibers<-c(1.312, 1.314, 1.479, 1.552, 1.700, 1.803, 1.861, 1.865, 1.944, 1.958,
            1.966, 1.997, 2.006, 2.021, 2.027, 2.055, 2.063, 2.098, 2.140, 2.179,
            2.224, 2.240, 2.253, 2.270, 2.272, 2.274, 2.301, 2.301, 2.359, 2.392,
            2.382, 2.426, 2.434, 2.435, 2.478, 2.490, 2.511, 2.514, 2.535, 2.554,
            2.566, 2.570, 2.586, 2.629, 2.633, 2.642, 2.648, 2.684, 2.697, 2.726,
            2.770, 2.773, 2.800, 2.809, 2.818, 2.821, 2.848, 2.880, 2.809, 2.818,
            2.821, 2.848, 2.880, 2.954, 3.012, 3.067, 3.084, 3.090, 3.096, 3.128,
            3.233, 3.433, 3.585, 3.585)

E_fibers<-c(1.901, 2.213, 2.203, 2.228, 2.257, 2.350, 2.361, 2.396, 2.397, 2.445,
            2.454, 2.474, 2.518, 2.522, 2.525, 2.532, 2.575, 2.614, 2.616, 2.618,
            2.624, 2.659, 2.675, 2.738, 2.740, 2.856, 2.917, 2.928, 2.937, 2.937,
            2.977, 2.996, 3.030, 3.125, 3.139, 3.145, 3.220, 3.224, 3.235, 3.243,
            3.264, 3.272, 3.294, 3.332, 3.346, 3.377, 3.408, 3.435, 3.493, 3.501,
            3.537, 3.554, 3.562, 3.628, 3.852, 3.871, 3.886, 3.971, 4.024, 4.027,
            4.225, 4.395, 5.020)

Calculating $U_E$ and $U_C$

# Function based on code from p. 346

Mann_Whitney_U_Stats<-function(E, C){
  UE_vector<-rep(NA, length(E)) # UE counter
  UC_vector<-rep(NA, length(C)) # UC counter
  for (i in 1:length(E)){
    UE_vector[i]<-sum(E[i]>C)
  }
  for (j in 1:length(C)){
    UC_vector[j]<-sum(C[j]>E)
  }

  list(UE=sum(UE_vector),
       UC=sum(UC_vector))
}
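
Applied to the Example 7.1 diet data, with Diet_A treated as the E group and Diet_B as the C group (an arbitrary labeling, used only for illustration):

Mann_Whitney_U_Stats(E=Diet_A, C=Diet_B)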

Mann-Whitney Discrete Function 1

This version takes as input $U_E$, $n_E$, $n_C$, the desired number of samples (default = 10000) and the desired number of intervals (default = 200).

#UE=21
#nE=5
#nC=5

mann_whitney_discrete_approx_1<-function(UE, nE, nC, n_samples = 10000, n_intervals = 200){
  fomega<-rep(0.0, n_intervals)        #counts of simulated UE values matching the observed UE

  for (j in 1:n_intervals){
    omega=0.5/n_intervals+(j-1)*(1/n_intervals)
    komega=(1-omega)/omega

    for (k in 1:n_samples){
      Uz<-rep(NA, nE)
      XE<-rexp(nE, rate=komega)
      XC<-rexp(nC, rate = 1)

      for (i in 1:nE){
        Uz[i]<-sum(XE[i]>XC)
      }

      if(sum(Uz) == UE) {fomega[j] = fomega[j]+1}
    }
  }


  tot=sum(fomega)
  omegapost=fomega/tot                 #discretized posterior over the omega grid
  omegav=seq(0.5/n_intervals, 1-(0.5/n_intervals), 1/n_intervals)
  omegabar=sum(omegapost*omegav)       #posterior mean
  comega=cumsum(omegapost)
  prH1=1-comega[n_intervals/2]         #posterior probability that omega > 0.5

  list(omegapost=omegapost,
       omegabar=omegabar,
       comega=comega,
       prH1=prH1)

}
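
A quick run using the commented-out test values from the top of this chunk; because the approximation is Monte Carlo, the output will vary slightly from run to run:

mann_whitney_discrete_approx_1(UE=21, nE=5, nC=5)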

Mann-Whitney Discrete Function 2

This version takes as input $E$, $C$, the desired number of samples (default = 10000) and the desired number of intervals (default = 200).

mann_whitney_discrete_approx_2<-function(E, C, n_samples=10000, n_intervals=200){ #Written to take in two vectors of observed data, default is 200 intervals
  UE_vector<-rep(NA, length(E)) # UE counter
  UC_vector<-rep(NA, length(C)) # UC counter
  for (i in 1:length(E)){
    UE_vector[i]<-sum(E[i]>C)
  }
  for (j in 1:length(C)){
    UC_vector[j]<-sum(C[j]>E)
  }
  UE=sum(UE_vector)
  UC=sum(UC_vector)

  nE<-length(E)
  nC<-length(C)

  fomega<-rep(0.0, n_intervals)        #counts of simulated UE values matching the observed UE

  for (j in 1:n_intervals){
    omega=0.5/n_intervals+(j-1)*(1/n_intervals)
    komega=(1-omega)/omega

    for (k in 1:n_samples){
      Uz<-rep(NA, length(E))
      XE<-rexp(length(E), rate=komega)
      XC<-rexp(length(C), rate = 1)

      for (i in 1:length(E)){
        Uz[i]<-sum(XE[i]>XC)
      }

      if(sum(Uz) == UE) {fomega[j] = fomega[j]+1}
    }
  }


  tot=sum(fomega)
  omegapost=fomega/tot                 #discretized posterior over the omega grid
  omegav=seq(0.5/n_intervals, 1-(0.5/n_intervals), 1/n_intervals)
  omegabar=sum(omegapost*omegav)       #posterior mean
  comega=cumsum(omegapost)
  prH1=1-comega[n_intervals/2]         #posterior probability that omega > 0.5

  list(omegapost=omegapost,
       omegabar=omegabar,
       comega=comega,
       prH1=prH1)

}
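
Applied to the Example 7.1 diet data (again treating Diet_A as E and Diet_B as C purely for illustration):

mann_whitney_discrete_approx_2(E=Diet_A, C=Diet_B)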

Mann-Whitney Lagrange Function 1

This version takes as input $n_E$, $n_C$, and $U_E$.

#UE = 300
#nE = 20
#nC = 20

mann_whitney_lagrange_approx_1<-function(nE, nC, UE){
  xs=UE/(nE*nC)                        #observed proportion of E > C comparisons
  if (xs >= 0.5){
    x = xs
  } else {
    x = 1-xs
  }

  nH=(2*nE*nC)/(nE+nC)                 #harmonic mean of the two sample sizes

  y5=(nH^1.1489)/(0.4792+nH^1.1489)

  w4=0.8-(1/(1+1.833*nH))
  w3=0.6-(1/(1+2.111*nH))
  w2=0.4-(1/(1+2.520*nH))
  w1=0.2-(1/(1+4.813*nH))

  y4=(y5*w4)+(1-w4)*0.5
  y3=(y5*w3)+(1-w3)*0.5
  y2=(y5*w2)+(1-w2)*0.5
  y1=(y5*w1)+(1-w1)*0.5

  Y=c(0.5, y1, y2, y3, y4, y5)

  La0=252-(1627*x)+(12500*x^2-15875*x^3+10000*x^4-2500*x^5)/3
  La1=-1050+(42775*x/6-38075*0.5*x^2+75125*x^3-48750*x^4+12500*x^5)/3
  La2=1800-12650*x+(104800*x^2-142250*x^3+95000*x^4-25000*x^5)/3
  La3=-1575+11350*x+(-96575*x^2+134750*x^3+92500*x^4+25000*x^5)/3
  La4=700+14900*x^2+15000*x^4-(15424*x+63875*x^3+12500*x^5)/4
  La5=-126+1879*0.5*x+(-16625*0.5*x^2+12125*x^3-8750*x^4+2500*x^5)/3

  LA<-c(La0, La1, La2, La3, La4, La5)

  ombar=sum(Y*LA)                      #mean of the approximating beta distribution

  absum=nH*(1.028+0.75*x)+2            #sum of the two beta shape parameters (a + b)

  if (xs>=0.5){
    a=ombar*absum
    b=(1-ombar)*absum
    omegabar=ombar
  } else {
    a=(1-ombar)*absum
    b=ombar*absum
    omegabar=1-ombar
  }

  na=a-1
  nb=b-1

  list(na=na,
       nb=nb,
       omegabar=omegabar)
}
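
A quick run using the commented-out test values above:

mann_whitney_lagrange_approx_1(nE=20, nC=20, UE=300)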

Mann-Whitney Lagrange Function 2

This version takes as input $E$ and $C$.

mann_whitney_lagrange_approx_2<-function(E, C){
  UE_vector<-rep(NA, length(E)) # UE counter
  UC_vector<-rep(NA, length(C)) # UC counter
  for (i in 1:length(E)){
    UE_vector[i]<-sum(E[i]>C)
  }
  for (j in 1:length(C)){
    UC_vector[j]<-sum(C[j]>E)
  }
  UE=sum(UE_vector)
  UC=sum(UC_vector)

  nE<-length(E)
  nC<-length(C)

  xs=UE/(nE*nC)                        #observed proportion of E > C comparisons
  if (xs >= 0.5){
    x = xs
  } else {
    x = 1-xs
  }

  nH=(2*nE*nC)/(nE+nC)                 #harmonic mean of the two sample sizes

  y5=(nH^1.1489)/(0.4792+nH^1.1489)

  w4=0.8-(1/(1+1.833*nH))
  w3=0.6-(1/(1+2.111*nH))
  w2=0.4-(1/(1+2.520*nH))
  w1=0.2-(1/(1+4.813*nH))

  y4=(y5*w4)+(1-w4)*0.5
  y3=(y5*w3)+(1-w3)*0.5
  y2=(y5*w2)+(1-w2)*0.5
  y1=(y5*w1)+(1-w1)*0.5

  Y=c(0.5, y1, y2, y3, y4, y5)

  La0=252-(1627*x)+(12500*x^2-15875*x^3+10000*x^4-2500*x^5)/3
  La1=-1050+(42775*x/6-38075*0.5*x^2+75125*x^3-48750*x^4+12500*x^5)/3
  La2=1800-12650*x+(104800*x^2-142250*x^3+95000*x^4-25000*x^5)/3
  La3=-1575+11350*x+(-96575*x^2+134750*x^3+92500*x^4+25000*x^5)/3
  La4=700+14900*x^2+15000*x^4-(15424*x+63875*x^3+12500*x^5)/4
  La5=-126+1879*0.5*x+(-16625*0.5*x^2+12125*x^3-8750*x^4+2500*x^5)/3

  LA<-c(La0, La1, La2, La3, La4, La5)

  ombar=sum(Y*LA)                      #mean of the approximating beta distribution

  absum=nH*(1.028+0.75*x)+2            #sum of the two beta shape parameters (a + b)

  if (xs>=0.5){
    a=ombar*absum
    b=(1-ombar)*absum
    omegabar=ombar
  } else {
    a=(1-ombar)*absum
    b=ombar*absum
    omegabar=1-ombar
  }

  na=a-1
  nb=b-1

  list(na=na,
       nb=nb,
       omegabar=omegabar)
}
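
Applied to the Example 7.2 fiber data, with E_fibers as the E group and C_fibers as the C group:

mann_whitney_lagrange_approx_2(E=E_fibers, C=C_fibers)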

McNemar

Condition Effects

Contrasts


